 shakespeare dataset


Contents of the Appendix

Neural Information Processing Systems

A.1 CIFAR-10 dataset. Figure 6 displays test accuracy curves for all six backbone algorithms under three distinct imbalance parameters drawn from $\{0.3, 1, 10\}$. The results show that FedNAR outperforms the baselines, particularly in scenarios with imbalanced data. A.2 Shakespeare dataset. Figures 7 and 8 show the results of experiments performed on the Shakespeare dataset. Six backbone algorithms were used, with initial weight decay values selected from $\{10^{-3}, 10^{-4}\}$. These findings indicate that FedNAR, as an adaptive weight decay scheduling algorithm, is effective across various initial weight decay values.



A Proofs for Fat-Tailed Federated Learning

Neural Information Processing Systems

A.1 Proof of FAT-Clipping-PR. For notational clarity, we have the following update: Local update: x … The first inequality follows from the strongly-convex property, i.e., … Assumption 4. (Bounded Stochastic Gradient Variance) There exists a constant … Assumption 5. (Bounded Gradient) There exists a constant … We remark that for any stochastic estimator that satisfies the above conditions, the above inequalities hold. The proof is exactly the same as the original proof [18]. Theorem 6. Suppose f is … We run a convolutional neural network (CNN) model on the CIFAR-10 dataset using FedAvg. The CNN architecture is shown in Table 2. To simulate data heterogeneity across clients, we manually … The dataset and model are taken from [45]. This implies that the gradient noise is fat-tailed.


Power-of-Two (PoT) Weights in Large Language Models (LLMs)

arXiv.org Artificial Intelligence

The complexity of neural networks is increasing rapidly due to the massive increase in model parameters. In Large Language Models (LLMs) specifically, the number of model parameters has grown exponentially in the past few years, for example, from 1.5 billion parameters in GPT2 to 175 billion in GPT3. This raises a significant challenge for implementation, especially on edge devices, where memory and processing power are very limited. In this work, we investigate reducing LLM complexity with a special type of quantization, power of two (PoT), for linear-layer weights and transformer tables. PoT not only reduces memory but, more importantly, significantly reduces computation by converting multiplication to bit shifting. We obtained preliminary results of PoT quantization on a Nano-GPT implementation using the Shakespeare dataset. We then extended the results to the 124-M GPT-2 model. The PoT quantization results are shown to be very promising, with a cross-entropy loss degradation of $\approx$[1.3-0.88] when [4-6] bits are used to represent the power levels.
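The idea of replacing multiplication with bit shifting can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`pot_quantize`, `pot_dequantize`, `shift_multiply`), the separate sign, the zero code, the round-to-nearest rule in the log domain, and the `max_exp` parameter are all assumptions made for illustration.

```python
import math


def pot_quantize(w, bits=4, max_exp=0):
    """Map a weight to (sign, exponent) so that w is roughly sign * 2**exponent.

    Assumed encoding (hypothetical): the sign is stored separately, `bits`
    bits encode the exponent, and one code is reserved for zero, which
    leaves 2**bits - 1 power levels ending at `max_exp`.
    An exponent of None stands for the zero code.
    """
    if w == 0.0:
        return 0, None
    min_exp = max_exp - (2 ** bits - 2)
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))  # nearest power of two in the log domain
    if exp < min_exp:
        return 0, None  # too small to represent: flush to zero
    return sign, min(exp, max_exp)  # clamp large weights to the top level


def pot_dequantize(sign, exp):
    """Recover the real-valued weight represented by (sign, exp)."""
    return 0.0 if exp is None else sign * 2.0 ** exp


def shift_multiply(x, sign, exp):
    """Multiply an integer activation x by sign * 2**exp using shifts only.

    A negative exponent becomes a right shift, which floors the result.
    """
    if exp is None:
        return 0
    shifted = x << exp if exp >= 0 else x >> -exp
    return sign * shifted


# Usage: a weight of 0.3 is stored as +2**-2, so multiplying the integer
# activation 40 by it becomes a right shift by two positions.
sign, exp = pot_quantize(0.3, bits=4)
product = shift_multiply(40, sign, exp)
```

Rounding in the log domain is only one possible choice; rounding to the nearest representable value in the linear domain would select different levels for some weights.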


Apodotiko: Enabling Efficient Serverless Federated Learning in Heterogeneous Environments

arXiv.org Artificial Intelligence

Federated Learning (FL) is an emerging machine learning paradigm that enables the collaborative training of a shared global model across distributed clients while keeping the data decentralized. Recent works on designing systems for efficient FL have shown that utilizing serverless computing technologies, particularly Function-as-a-Service (FaaS) for FL, can enhance resource efficiency, reduce training costs, and alleviate the complex infrastructure management burden on data holders. However, current serverless FL systems still suffer from the presence of stragglers, i.e., slow clients that impede the collaborative training process. While strategies aimed at mitigating stragglers in these systems have been proposed, they overlook the diverse hardware resource configurations among FL clients. To this end, we present Apodotiko, a novel asynchronous training strategy designed for serverless FL. Our strategy incorporates a scoring mechanism that evaluates each client's hardware capacity and dataset size to intelligently prioritize and select clients for each training round, thereby minimizing the effects of stragglers on system performance. We comprehensively evaluate Apodotiko across diverse datasets, considering a mix of CPU and GPU clients, and compare its performance against five other FL training strategies. Results from our experiments demonstrate that Apodotiko outperforms other FL training strategies, achieving an average speedup of 2.75x and a maximum speedup of 7.03x. Furthermore, our strategy significantly reduces cold starts by a factor of four on average, demonstrating suitability in serverless environments.


Federated Learning with Sparsified Model Perturbation: Improving Accuracy under Client-Level Differential Privacy

arXiv.org Artificial Intelligence

Federated learning (FL), which enables edge devices to collaboratively learn a shared model while keeping their training data local, has received great attention recently and can protect privacy in comparison with the traditional centralized learning paradigm. However, sensitive information about the training data can still be inferred from model parameters shared in FL. Differential privacy (DP) is the state-of-the-art technique to defend against those attacks. The key challenge to achieving DP in FL lies in the adverse impact of DP noise on model accuracy, particularly for deep learning models with large numbers of parameters. This paper develops a novel differentially-private FL scheme named Fed-SMP that provides a client-level DP guarantee while maintaining high model accuracy. To mitigate the impact of privacy protection on model accuracy, Fed-SMP leverages a new technique called Sparsified Model Perturbation (SMP), where local models are sparsified first before being perturbed by Gaussian noise. We provide a tight end-to-end privacy analysis for Fed-SMP using Rényi DP and prove the convergence of Fed-SMP with both unbiased and biased sparsifications. Extensive experiments on real-world datasets demonstrate the effectiveness of Fed-SMP in improving model accuracy under the same DP guarantee while also saving communication cost.
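The sparsify-then-perturb order described in the abstract can be sketched as follows. This is a hedged illustration, not the Fed-SMP algorithm itself: the choice of a top-k sparsifier, the L2 clipping step, the decision to add noise only to the kept coordinates, and all names (`top_k_mask`, `sparsify_then_perturb`, `clip_norm`, `noise_std`) are assumptions, and the sketch makes no claim about the resulting privacy guarantee.

```python
import math
import random


def top_k_mask(update, k):
    """Boolean mask selecting the k largest-magnitude coordinates."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    keep = set(order[:k])
    return [i in keep for i in range(len(update))]


def sparsify_then_perturb(update, k, clip_norm, noise_std, rng):
    """Sparsify a local update first, then perturb it with Gaussian noise.

    Assumed steps (hypothetical):
      1. keep only the top-k coordinates,
      2. clip the sparse vector to L2 norm `clip_norm`,
      3. add N(0, noise_std**2) noise to the kept coordinates only.
    """
    mask = top_k_mask(update, k)
    sparse = [u if m else 0.0 for u, m in zip(update, mask)]
    norm = math.sqrt(sum(v * v for v in sparse))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [
        (v * scale + rng.gauss(0.0, noise_std)) if m else 0.0
        for v, m in zip(sparse, mask)
    ]


# Usage: a 4-dimensional update, keeping 2 coordinates.
rng = random.Random(0)
noisy = sparsify_then_perturb([0.1, -2.0, 0.05, 1.5], k=2,
                              clip_norm=1.0, noise_std=0.1, rng=rng)
```

Because dropped coordinates stay exactly zero, only the k kept values and their indices would need to be transmitted, which is where a communication saving could come from in a scheme of this shape.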


Communication-Efficient Federated Learning via Optimal Client Sampling

arXiv.org Machine Learning

Federated learning (FL) ameliorates privacy concerns in settings where a central server coordinates learning from data distributed across many clients. The clients train locally and communicate the models they learn to the server; aggregation of local models requires frequent communication of large amounts of information between the clients and the central server. We propose a novel, simple and efficient way of updating the central model in communication-constrained settings based on collecting models from clients with informative updates and estimating local updates that were not communicated. In particular, modeling the progression of the model's weights by an Ornstein-Uhlenbeck process allows us to derive an optimal sampling strategy for selecting a subset of clients with significant weight updates. The central server collects updated local models from only the selected clients and combines them with estimated model updates of the clients that were not selected for communication. We test this policy on a synthetic dataset for logistic regression and two FL benchmarks, namely, a classification task on EMNIST and a realistic language modeling task using the Shakespeare dataset. The results demonstrate that the proposed framework provides a significant reduction in communication while maintaining competitive or achieving superior performance compared to a baseline. Our method represents a new line of strategies for communication-efficient FL that is orthogonal to existing user-local methods such as quantization or sparsification, thus complementing rather than replacing those methods.


LEAF: A Benchmark for Federated Settings

arXiv.org Machine Learning

Modern federated networks, such as those comprised of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, learning in federated settings presents new challenges at all stages of the machine learning pipeline. As the machine learning community begins to tackle these challenges, we are at a critical time to ensure that developments made in this area are grounded in real-world assumptions. To this end, we propose LEAF, a modular benchmarking framework for learning in federated settings. LEAF includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.